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ABSTRACT 

We  report  our  experience  with  an  application  of  the  new  modified  Cholesky  Factorization  of  Schnabel 
&i  Eskow  in  the  context  of  nonlinear  optimization.  We  have  implemented  a  modified  version  of  the  new 
factorization  for  sparse  symmetric  linear  systems,  without  pivoting,  and  have  incorporated  it  into  the 
framework  of  a  large-scale  truncated  Newton  minimization  package.  In  our  truncated  Newton  algorithm, 
we  use  the  preconditioned  linear  conjugate  gradient  method  to  solve  iteratively  and  approximately  for  the 
Newton  search  direction.  The  MCF  is  used  here  to  guarantee  that  the  effective  preconditioner  is  positive 
definite.  Details  of  the  implementation  are  provided,  and  performance  comparisons  with  the  Gill,  Murray 
and  Wright  modified  Cholesky  factorization  for  sample  problems  are  reported.  Our  preliminary  results 
demonstrate  that  the  two  modified  Cholesky  factorizations  perform  quite  differently  in  practice  in  our 
truncated  Newton  context.  A  clear  difference  in  timing  of  the  modification  (i.e.,  first  nonzero  increment), 
in  addition  to  differences  in  numerical  values,  suggests  that  the  Schnabel  &:  Eskow  strategy  may  work 
better  for  highly  indefinite  systems  while  the  Gill,  Murray  and  Wright  strategy  may  be  more  effective 
for  nearly  positive  definite  systems.  As  a  consequence,  the  new  factorization  may  be  especially  useful  for 
minimization  of  highly  nonlinear  functions  by  Newton  methods. 
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Implementation  of  the  Schnabel  &  Eskow  Modified 

Cholesky  Factorization  in  the  Context  of 

a  Truncated-Newton  Optimization  Method 

I.  INTRODUCTION 

The  purpose  of  this  communication  is  to  report  our  experience  with  an  appHcation 
of  the  new  modified  Cholesky  Factorization  (MCF)  of  Schnabel  Sz  Eskow  [1,2]  in  the  con- 
text of  our  nonlineair  optimization  package.  We  have  implemented  a  modified  version  of 
the  new  factorization  for  large  sparse  symmetric  linear  systems,  without  pivoting,  and 
have  incorporated  it  into  the  framework  of  a  large-scale  truncated  Newton  minimization 
algorithm  [3].  In  our  truncated  Newton  algorithm,  we  approximate  the  solution  to  the 
Newton  equations  at  every  step  by  using  an  iterative,  linear  Conjugate  Gradient  method 
(CG).  The  CG  loop  is  "truncated"  when  either:  1)  a  suitably  chosen  truncation  criterion 
is  satisfied,  or  2)  indefiniteness  is  detected,  in  which  case  a  guaranteed  direction  of  descen- 
t  is  still  produced.  The  CG  loop  is  accelerated  through  preconditioning.  To  guarantee 
that  the  effective  preconditioner  —  often  chosen  from  the  physics  of  the  problem  —  is 
positive  definite,  we  use  the  modified  Cholesky  factorization.  In  this  report,  details  of  the 
implementation  will  be  provided,  and  performance  comparisons  with  the  Gill,  Murray  and 
Wright  MCF  [4]  for  sample  computationeil  chemistry  problems  and  the  trigonometric  func- 
tion will  be  reported.  Results  demonstrate  that  the  two  modified  Cholesky  factorizations 
perform  quite  differently  in  practice  and  that  their  relative  performance  is  sensitive  to 
problem  chairacteristics  related  to  indefiniteness.  A  clear  difference  in  timing  of  modifica- 
tions (i.e.,  first  nonzero  increment)  suggests  that  the  Schnabel  Sz  Eskow  strategy  may  work 
better  for  laxge,  highly  indefinite  systems  —  especially  with  clustered  negative  eigenvalues 
—  while  the  Gill,  Murray  &  Wright  strategy  may  be  more  effective  for  nearly  positive  def- 
inite systems.  Thus,  the  Schnabel  Sz  Eskow  factorization  may  be  paxticulEirly  important 
for  minimization  of  highly  nonlinear  functions  by  Newton  methods. 


II.  THE  MODIFIED  CHOLESKY  FACTORIZATION 

The  recently-reported  MCF  of  Schnabel  &  Eskow  (henceforth  denoted  as  Si:E) 
presents  an  alternative  to  the  Gill,  Murray,  and  Wright  MCF  (GMW).  The  GMW  factor- 
ization has  been  used  extensively  to  solve  linear  systems  Mz  =  r  for  coefficient  matrices  M 
that  are  symmetric  but  not  necessarily  positive  definite.  Such  factorizations  are  particu- 
larly appropriate  in  the  context  of  nonlinejir  optimization,  and  this  topic  will  be  addressed 
in  the  next  section.  Effectively,  a  diagonal  matrix  E  of  nonnegative  elements  is  added  to 
M,  Eind  the  sum  M  +  E  is  factored  as  M  +  E  =  LDL^  where  D  is  diagonal  and  L  is  unit 
lower- triangular.  (M,  E,  L,  and  D  are  all  n  x  n  matrices). 

Two  important  questions  in  these  modified  factorizations  axe  the  following:  1)  how 
do  we  perform  the  MCF  in  a  numerically-stable  manner  without  knowing  the  complete 
eigenvalue  distribution  of  M  a  priori?  and  2)  how  do  we  choose  the  appropriate  numerical 
values  for  the  entries  of  E?  The  first  issue  is  important  because  the  Cholesky  factors  may 
be  ill-conditioned  and  may  not  exist  for  an  indefinite  system;  thus,  the  stzindard  process 
must  be  amended  appropriately.  PYirthermore,  the  algorithm  should  be  formulated  so 
as  to  keep  the  computational  cost  as  close  as  possible  to  that  of  the  standard  Cholesky 
factorization.  In  regard  to  the  second  question  of  constructing  E,  we  must  generate  a 
clever  recipe  for  choosing  the  increments  tj,  j  =  l,...,n,  so  as  to  bcdance:  1)  on  one 
extreme,  selecting  large  modifications  that  guarantee  positive  definiteness  but  perturb  the 
original  matrix  excessively;  with  2)  on  the  other  extreme,  choosing  small,  just  sufficient, 
increments  that  may  lead  later  to  very  large  increments. 

The  new  S&E  is  attractive  because  it  possesses  a  lower  a  priori  upper  bound  on  the  size 
oi  E  —  measured  as  ||£^||oo  —  than  the  GMW  factorization  [1,2].  Its  cost  is  comparable  to 
that  of  GMW,  involving  an  additional  multiple  of  n^  operations  to  the  standard  Cholesky 
factorization.  Furthermore,  ntmaerical  experiments  with  S&zE  have  suggested  that  the 
resulting  H^^Hoo  may  indeed  be  smaller  thcin  GMW  in  practice  [1]. 
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Full  details  of  the  two  factorizations  are  provided  in  the  original  papers.  Below  we  only 
summarize  the  two  different  procedures  for  choosing  the  elements  of  E.  While  pivoting 
strategies  are  also  different  for  the  two  factorizations,  they  will  not  be  addressed  here  since 
we  are  interested  in  a  no-pivoting  version.  It  is  worth  mentioning,  however,  that  while 
pivoting  does  not  aiffect  the  theoretical  upper  bounds  on  ||£'||ooi  it  may  produce  better 
results  in  practice. 

Consider  step  j  of  a  row-wise,  no-pivoting,  modified  Cholesky  factorization  (see  Fig- 
ure 1).  We  adopt  the  convention  that  computed  elements  of  L  and  D  are  overwritten  in 
the  storage  arrays  of  the  original  A/  and  that  the  diagonal  array  of  D  has  been  initialized 
at  step  0  as  dj  *—  rrijj,  j  =  1, . . . ,  n.  As  step  j  begins,  the  upper  [j  —  I)  by  {j  —  I)  submatrix 
conteiins  the  final  values  for  the  elements  of  L  and  D.  Row  j  up  to  the  diagonal  contains 
the  computed  "auxiliaxy"  quantities  Cj,  =  (.jgd,,  s  =  l,...,j  —  1.  Similarly,  rows  j  +  1 
through  n  up  to  column  j  contain  their  auxiliaxy  values  c.  The  candidate  for  dj  has  been 
computed  at  the  end  of  step  j  —  I.  The  current  step  _;'  of  the  MCF  consists  of  the  following 
substeps: 

(a)  Update  row  j: 

£j,=c.,Jd,,      s  =  l,...,j-l.  (1) 

(b)  Compute  the  auxiliary  quantities  c  in  column  j  below  the  diagonal,  and  set 
9  to  some  functional  value  (to  be  discussed  below)  of  these  quantities: 

j-i 
Cs]  =  m,j  -^(.jkCsk  ,      5  =  ;  +  1, . . .  ,  n  ,  (2) 

k=\ 

9  =  f(Cj)   ,        Cj   =  {Cj  +  ij,Cj+2,j,  .  .  .  Cn,jf  .  (3) 

(c)  Modify  dj  if  necesseiry  by  adding  a  quantity  Cj  specified  by  a  given  recipe 
(further  details  are  provided  later): 

d,  ^  d,  +  ej  .  (4) 
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(d)  Update  the  prospective  diagonal  elements: 

d^  i- ds  -  clj  /  dj  ,      s  =  J  +  l,....n  .  (5) 

The  deteiils  of  substeps  (b)-(d)  above  will  now  be  discussed  in  turn  for  the  GMW  and  Si:E 
factorizations. 

(A)  The  GMW  Factorization 

In  GMW,  the  values  for  Cj  are  chosen  so  as  to  minimize  an  a  priori  upper  bound  on 
||£'||oo  subject  to  the  condition  that  positive  definite  matrices  will  not  be  perturbed  (i.e., 
M  positive  definite  =>  £■  =  0). 

First,  the  method  requires  as  input  the  values  of  two  small  numbers:  €m  —  machine 
precision,  and  r  —  the  lowest  acceptable  value  for  dj  +'  ej,  typically  of  order  of  square 
or  cubic  root  of  €„■  Second,  the  following  key  psirameters  must  be  computed  before  the 
factorization  begins: 

7  =  max{  |m_,j|  }  (6) 


and  the  bound 


^  =  max{  |m_„|  }  (7) 


/3  =  max{7, — .Cm)  •  (8) 

maxjl,  vn  —  1) 


In  substep  (b)  of  the  MCF  as  sketched  above,  the  GMW  function  is:  /ga/u'(c)  = 
||c||oo-  Thus,  9  contains  the  largest  magnitude  among  the  c,j,  5  =  j  +  1, . . . ,  n.  In  substep 
(c)  of  the  MCF,  the  GMW  recipe  for  incrementing  dj  defines  Cj  impUcitly  through  the 
formula: 

dj^max{\dj\  ,  r  ,  e'/0}  .  (9) 

The  following  points  are  worth  noting.  First,  the  matrix  M  must  be  assembled  a  priori 
for  GMW  so  that  7  eind  ^  can  be  computed.   Thus,  some  modifications  may  be  required 


in  some  IsLrge-scaJe  applications  if,  tj-pically.  the  entries  of  M  are  assembled  row-by-row 
as  the  factorization  proceeds.  Second,  even  when  the  candidate  for  dj  is  greater  than  r, 
the  value  may  be  modified  if  it  is  less  than  6'^ /3.  This  situation  can  occur  when  M  is 
not  sufficiently  positive  definite.  If  the  relation  r  <  9'^ / 3  <  \dj\  holds,  then  Cj  =  2\dj\. 
Our  experience  has  shown  that  for  indefinite  systems  this  is  often  the  resulting  increment. 
Third,  the  restriction  that  dj  >  6'^  / 3  impHes  that 

ds\\el,\\oo<?  .  (10) 

Hence  the  name  "bound"  for  Q. 

(B)  The  S&E  Factorization 

The  S<kE  recipe  for  adding  the  e^'s  is  not  appHed  in  a  straiightforward  manner  —  at 
every  step  j  —  as  in  GMW.  Instead,  a  two-phase  strategy  for  the  entire  factorization  is 
used.  Phase  1  denotes  the  steindard  Cholesky  factorization  in  which  all  increments  are 
zero.  Phase  1  ends  when  the  diagoned  test  —  see  below  —  indicates  that  some  prospective 
diagonal  in  the  remaining  submatrix  will  become  vmreasonably  small.  At  this  point,  the 
cdgorithm  switches  to  Phase  2,  and  the  S<kE  recipe  for  the  increments  is  applied  from  this 
step  forward. 

Two  small  pcirameters  must  be  prescribed  by  the  user  for  the  S«SjE  factorization:   rj 

and  T2,  typically  around  the  same  magnitude  as  the  cubic  root  of  machine  precision.  Now 

recedl  that  the  candidate  for  dj,  as  well  as  for  the  updated  diagonsd  elements  {djj^\,  . . ., 

dn}  in  the  remaining  {n  —  j  -\-  1)  by  [n  —  j  -\-  1)  submatrix  are  computed  at  step  j  —  1, 

substep  (d),  of  the  MCF  (eq.  (5)).    In  S&:E,  while  we  are  still  in  Phase  1,  the  phase  test 

sets  dmm  at  step  _;  to 

dmtn  *-  nim{ds}  (11) 

»>} 

and  tests  whether 

dmin  <  ri7  .  (12) 
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The  parameter  7  is  the  same  for  GMW,  defined  by  eq.  (6).    If  eq.  (12)  holds,  the  SfcE 
algorithm  switches  to  Phase  2,  and  the  Cj  recipe  is  applied. 

The  function  /  in  substep  (b)  of  the  MCF  (eq.  (3))  is  the  1-norm  for  S&E  instead  of 
the  00-norm  as  in  GMW:  fsksic)  =  ||c||i  =  I3"=j+i  |c»jl-  Thus,  9  =  fsksic)  contains 
the  Gerschgorin  radius  for  row  j  at  the  current  step  of  the  factorization.  The  recipe  for  Cj 
is  then  given  by: 

Cj  =  maxjO  ,  -dj  +  max{^  ,  T2f]  ,  Cj-i}  ,  (13) 

where  ey-i  is  the  increment  at  step  j  —  I- 

For  the  last  two  steps  (which  determine  e„  and  e„_i),  Schnabel  &:  Eskow  modify  their 
recipe  above  by  using  eigenvalues  of  the  finzd  2x2  submatrix  directly.    If  \(o  and  \hi 


denote  the  eigenvalues  of  the  remaining  submatrix 
n  —  2,  then 


"n  — 1         Cn,n  — 1 
Cfi.Tj  — 1  "n 


at  the  end  of  step 


Cn-i  =  Cn  =  max 


I  0  ,  e„_2  ,  -\to  +  T2   *  msx{——^{\hi  -  A/o)  ,  7}  p 


(14) 


The  reasoning  behind  this  Gerschgorin  strategy  in  S&E  is  that  \\E\\oo  can  be  shown  to 
be  bounded  above  by  the  magnitude  of  the  most  negative  lower  Gerschgorin  bound  of  the 
submatrix  remaining  at  the  step  where  Phase  2  is  entered.  In  combination  with  an  upper 
bound  for  Phase  1,  this  result  produces  a  lower  upper  bound  on  ||£'||oo  for  S.SjE  than  for 
GMW. 


III.  THE  MCF  IN  THE  CONTEXT  OF  OPTIMIZATION 

Modified  Cholesky  factorizations  Eire  paxticulaxly  useful  in  the  context  of  nonlinear 
optimization.  In  iterative  Newton-type  methods  for  minimizing  F(x),  x  G  R",  a  sequence 
of  vectors  {xo,Xi,.  ..}  is  generated  by  the  rule  Xjt+i  =  x^  -I-  A^pfc.  The  vector  pk  is 
a  search  direction,  and  the  scalar  A/t  >  0  is  a  step  length  that  ensures  sufficient  function 
decrease.  The  vector  pjt  can  be  obtzdned  by  solving  the  Newton  equations  for  p, 

HkP  =  -gfc  ,  (15) 

where  H  and  g  axe  the  Hessian  and  gradient  vector,  respectively,  of  £J  at  Xfc. 

In  the  "modified  Newton"  framework,  developed  to  work  in  practice  when  a  local 
quadratic  approximation  to  F  may  be  poor,  a  positive  definite  approximation  to  Hk  may 
replace  Hk  in  eq.  (15).  Fiirthermore,  in  the  truncated  Newton  variation,  the  Newton  (or 
modified  Newton)  equations  may  only  be  solved  approximately  [5,6].  The  rationale  behind 
this  approximation  is  that  it  is  necessary  to  obtain  an  accurate  Newton  seaxch  direction 
only  near  a  minimum;  far  away  from  such  regions,  any  descent  direction  will  suffice,  so 
the  cost  for  exact  solution  is  unwaxranted.  Indeed,  it  hcis  been  shown  in  theory  and  in 
practice  [5,3,7]  that  overall  convergence  is  not  sacrificed  if  the  truncation  criterion  is  chosen 
to  ensure  that,  as  a  critical  point  of  F  is  approached,  the  solution  for  pk  becomes  more 
accurate.  For  example,  the  approximation  (to  the  Newton  seaxch  direction  at  step  k)  can 
be  controlled  by  a  paxcimeter  77 jt,  set  to 

T]k  =mm{6/k  ,  llgili}  (16) 

where  S  <  1,  and  then  the  residual  is  computed  to  satisfy 

\\HkPk+gk\\<rik\\gk\\  .  (17) 

From  the  discussion  above,  one  obvious  utility  for  the  MCF  appears:  modify  Hk 
when  it  is  not  positive  definite  as  Hk  *—  Hk  +  Ek  in  eqs.  (15)  or  (17).   The  less  obvious 

7 


utility  for  the  MCF  in  our  modified  Newton  framework  arises  in  the  truncated  Newton 
implementation. 

Truncated  Newton  methods  lead  to  a  nested  iteration  structure:  outer  loop  for  x^, 
inner  loop  for  p^.  An  iterative,  rather  than  direct,  procedure  must  be  used  in  the  inner 
loop  to  edlow  truncation  of  the  solution  process  for  p^.  One  attractive  method  for  large- 
scale  problems  is  the  linear  preconditioned  Conjugate  Gradient  method  (PCG)  [8],  which 
must  be  used  with  adaptations  to  indefinite  systems  [3,5].  Preconditioning  is  important  for 
accelerating  convergence,  but  it  adds  yet  another  linear  system,  which  must  be  solved  at 
every  inner  iteration:  A/z  =  r,  where  A/  is  the  preconditioner  of  if,  and  z  and  r  are  vec- 
tors. Typically,  the  benefit  in  performance  from  preconditioning  outweighs  the  additional 
complexity  involved.  Now,  if  M  is  not  guaranteed  to  be  positive  definite,  the  MCF  can  be 
used  to  modify  A/asAf<— Af-l-f^so  that  the  "effective  preconditioner"  is  positive  definite. 
This  context  is  appropriate  in  majiy  large-scale  applications  that  arise  in  computational 
chemistry,  mathematical  biology,  or  meteorology,  where  a  spairse  preconditioner  can  often 
be  extracted  from  the  "physics"  of  the  problem.  TNPACK  implements  the  MCF  in  this 
way,  and  further  deteiils  of  the  algorithm  follow  in  the  next  section. 

Whether  the  MCF  is  used  to  modify  the  Hessian  or  the  preconditioner  in  the  modified 
Newton  fraimework,  the  following  questions  arise  regarding  implementation  and  testing  of 
the  GMW  and  S&E  factorizations  in  this  minimization  context: 

(1)  Can  both  factorizations  be  implemented  effectively  in  an  optimization  code  in 
analogous  programming  style  —  so  that  both  factorizations  can  be  exercised? 

(2)  In  paxticular,  can  the  two  factorizations  be  implemented  efficiently  in  large-scale 
minimization  codes  where  problem  structure  and  size  mandate  special  storage  and  algo- 
rithmic considerations? 

(3)  On  average  over  the  number  of  iterations,  will  ||£'||oo  be  lower  for  S&E  than  for 
GMW? 

(4)  Whatever  the  outcome  from  question  (3)  above,  how  will  the  size  of  HEHoo  affect 
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the  overall  progress  toward  a  local  minimum? 

In  the  next  section,  we  summarize  some  implementation  details,  and  in  the  following 
section  we  describe  several  numerical  experiments. 

IV.  IMPLEMENTATION  OF  THE  MCF  IN  TNPACK 

Our  recently- prepared  package  TNPACK  [3]  uses  a  large  scale  truncated- Newton  algo- 
rithm for  unconstrained  nonlinear  optimization.  As  described  above,  truncation  is  accom- 
plished by  using  PCG  to  solve  approximately  the  Newton  equations  so  that  eqs.  (16)  and 
(17)  are  satisfied.  Since  PCG  was  originally  developed  for  positive  definite  systems,  adap- 
tations for  the  indefinite  case  must  be  made  in  the  present  context.  This  involves  halting 
a  PCG  iteration  when  a  direction  of  negative  curvature  (d)  is  detected:  d^Hkd  <  Cd^d 
where  (^  is  a  tolereince  pEirameter.  In  this  event,  we  exit  the  inner  PCG  loop  with  a  search 
direction  that  is  always  a  direction  of  descent  [3,5].  In  brief,  if  this  occurs  at  the  first  PCG 
iteration,  pt  is  set  to  —M~^gk  (— gfc  is  another  possibility);  otherwise,  pjt  is  set  to  the 
current  p  in  the  inner  iteration.  (Other  choices  after  the  first  iteration,  such  as  the  current 
d  or  the  steepest  descent  direction,  — gk,  cam  be  selected  by  the  user). 

Next,  we  have  developed  a  procedure  for  implementing  PCG  for  large-scale  prob- 
lems where  the  preconditioner  M  is  large  and  sparse.  Recall  that  CG  methods  require 
Hessian/vector  multiplications  (Hd)  at  every  step,  and  that  PCG  algorithms  require,  in 
addition,  solution  of  a  linear  system  A/z  =  r  at  every  step.  Thus  M  remains  constant 
throughout  the  inner  PCG  loop  of  Newton  step  k,  but  several  solutions  are  required  for 
different  right -hand- side  vectors  r.  In  TNPACK,  we  solve  these  linear  sytems  by  a  sparse 
MCF. 

Our  sparse  factorization  is  based  on  the  Yale  Sparse  Matrix  Package,  YSMP  [9-11]. 
This  package  was  developed  to  solve  large,  sparse  systems  efficiently  by  Gaussian  elim- 
ination. For  positive  definite  systems,  efficiency  is  accomplished  by  1)  using  compact 
row-by-row  storage  schemes  for  M  and  its  Cholesky  factors,  2)  reordering  the  rows  and 
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columns  of  M  in  positions  which  were  zeros  for  M ,  3)  performing  arithmetic  on  the  nonzero 
elements  only,  and  4)  employing  a  modular  solution  process.  The  modularity  is  accom- 
plished through  3  separate  stages  —  symbolic  factorization,  numerical  factorization,  and 
numerical  solution  —  and  allows  efficiency  in  solving  several  related  systems  (e.g.,  same 
Af ,  same  spaxsity  structure).  We  have  modified  two  YSMP  routines  for  TNPACK  to  han- 
dle matrices  that  are  symmetric  but  not  necessarily  positive  definite  by  incorporating  the 
MCF.  Thus,  although  our  target  problems  axe  laxge  and  their  associated  Hessians  may  be 
dense,  the  preconditioners  are  typically  sparse  and  the  YSMP  is  efficient  with  its  compact 
storage  ajid  computational  schemes. 

(A)  No-Pivoting  Strategies 

In  TNPACK,  reordering  of  the  variables  is  provided  as  an  option.  Reordering  is  used 
to  minimize  fill-in  of  the  Cholesky  factors  in  M  ajid  is  performed  only  once,  before  the 
factorization  of  M  begins.  In  many  large-sczde  optimization  problems,  a  sparse  Hessian  or 
a  sparse  approximation  to  the  Hessian  is  involved.  Often,  sparse  preconditioners  can  be 
easily  formulated  from  the  separability  structure  of  the  Hessian  in  question.  For  example, 
in  image  processing  applications,  neighboring  pixels  can  be  assembled.  In  computational 
chemistry,  energy  components  from  the  "local"  interactions  —  bond-length  and  bond- 
zuigle  potentials  —  can  be  included  [12,13].  (The  remaining  terms  are  electrostatic  and 
Van  der  Wa.als  and  are  computed  over  all  atom  pziirs.)  Moreover,  in  these  physical  ap- 
plications, the  pattern  of  M  often  remeuns  the  same  throughout  the  entire  optimization 
process.  Thus,  reordering  of  the  variables  can  be  performed  only  once,  at  the  beginning 
of  the  minimization  algorithm.  Every  subsequent  Newton  step  will  then  compute  different 
numerical  values  for  M,  and  the  MCF  without  pivoting  will  be  used  to  factor  M  (once) 
and  then  solve  for  several  different  right-hand-side  vectors  in  each  inner  PCG  iteration. 
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(B)  The  Phase-Test  Strategy 

When  no-pivoting  is  used  in  MCF,  two  sets  of  different  input  parajneters,  values  of 
9,  and  recipes  for  setting  the  e^'s  for  the  two  factorizations  can  easily  be  programmed 
in  FORTRAN  IF-THEN-ELSE  segments  (so  that  both  factorizations  can  be  exercised). 
Less  straightforward  is  implementing  the  phcise  test  of  S&rE.  If  an  existing  MCF  algorithm 
is  written  as  in  Section  II  —  with  prospective  diagonals  updated  at  every  step  of  the 
elimination  —  the  phase  test  is  easy  to  implement.  If,  on  the  other  hand,  the  code  does 
not  update  the  diagonals  at  every  step  and  instead  computes  each  candidate  dj  at  step  j, 
substep  (a)  of  the  code  must  be  modified  as: 

7-1  ' 

d,^rn,,-Y,^],ds.  (18) 

For  large  linear  systems,  the  requirement  to  update  the  diagonal  elements  at  every  step 
means,  of  course,  that  the  entire  diagonal  eirray  be  available  before  the  MCF  begins.  Thus, 
codes  from  applications,  such  as  multifrontcd  methods,  that  normally  assemble  matrix 
entries  row-by-row  as  the  elimination  proceeds,  must  be  modified  to  assemble  all  diagonals 
at  the  start.  (Off-diagonals  may  still  be  computed  progressively).  The  diagonal  array  is 
also  required  to  compute  7  (eq.  (6)).  Note  that  GMW  requires,  in  addition,  that  the  off- 
diagonal  array  be  available  at  the  start  of  the  factorization  in  order  to  compute  ^  (eq.  (7)). 

(C)  The  Recipe  Strategy 

A  remaining  issue  to  be  addressed  in  implementing  Si:E  in  the  context  of  large-scale 
nonline^Lr  optimization  is  the  Gerschgorin-based  formula  for  setting  the  e^'s.  Schnabel  ^ 
Eskow  propose  two  modifications  which  do  not  change  the  theoretical  bounds  on  ||£'||oo  but 
may,  in  fact,  lead  to  better  results  in  practice.  The  first  modification  is  eq.  (14)  —  using 
actual  eigenvalues  of  the  remaining  2x2  submatrix  to  determine  e„  and  e„_i .  The  second 
modification  is  including  ej-\  in  eq.  (13)  for  tj.  The  reasoning  behind  this  strategy  is  that 
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requiring  ej  >  ej-\  will  not  increase  ||£'||oo  but  may,  in  fact,  lead  to  smaller  increments  at 
later  Cholesky  steps  [1,2]. 

Our  experience  suggests  (see  next  section)  that  including  ej_i  in  the  formula  for  Cj 
does  not  always  improve  the  overall  performance  in  optimization.  While  ||£'||oo  may  indeed 
be  larger  if  the  condition  ej  >  ej-i  is  not  imposed,  this  outcome  is  balanced  by  the  fact 
that  fewer  overall  modifications  to  A/  are  involved.  Moreover,  for  very  large  sparse  systems, 
there  is  an  additional  argument  for  omitting  this  condition:  if  ij+i,j  or  Cj^-i^  is  zero,  a 
larger  dj  will  not  lead  to  a  more  negative  dj+i  and  hence  to  a  larger  e^+i  —  see  eq.  (5). 

The  second  modification  of  the  recipe  at  the  last  two  steps  may  have  a  small  improve- 
ment, if  any,  on  ||£^||oo  for  large-scale  systems.  Furthermore,  this  special  treatment  of  the 
last  two  variables  may  destroy  some  inherent  symmetry  of  the  problem.  For  example,  if 
an  objective  function  suggests  the  relation  Xj  =  Xj^2^  j  =  1,2, . . .,  it  seems  reasonable  to 
allow  an  analogous  symmetry  in  the  components  of  the  PCG  vectors  (which,  in  turn,  will 
lead  to  symmetry  in  the  search  direction). 

V.  NUMERICAL  EXPERIMENTS 

The  numerical  examples  we  discuss  here  are  meant  to  suggest  various  trends  and 
possibilities  that  may  occur  in  practice  in  the  context  of  optimization  when  the  two  different 
MCF  strategies  are  implemented.  It  is  clear  that  the  optimality  question  of  one  strategy 
versus  another  is  very  difficult,  because  the  overall  effect  on  minimization  is  cumulative. 
For  this  reason,  general  conclusions  may  only  be  reached  after  thorough  investigations  on 
a  wide  variety  of  problems.  Nonetheless,  our  current  experience  may  already  prove  useful 
to  investigators  and  we  share  it  in  hope  that  further  studies  will  be  motivated. 

We  study  in  detail  the  effect  of  the  different  MCF  strategies  for  three  types  of  objective 
functions  with  variations  of  size,  starting  points,  parameters,  and  preconditioners.  We  use 
TNPACK  with  the  GMW  and  S&E  factorizations  implemented  as  discussed  previously. 
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Three  versions  of  MCF  were  implemented:  GMW;  S&E  (a)  —  with  the  condition  C;  >  ej_i 
imposed;  and  S&E  (b)  —  with  this  condition  omitted.  The  tolerance  parameters  were  set 
to  n  =  T2  =  l.OE-05  for  S&E,  and  r  =  l.OE-08  *max{l,7,0  for  GMW.  This  value  of  r 
for  GMW  was  chosen  since  our  typical  value  for  7  (7  >  0  in  the  computational  chemistry 
problems  has  magnitude  of  O(IO^).  The  convergence  criterion  to  reach  a  minimum  was 
set  to  ||g||  <  l.OE-08  *  max{l,  ||x||}  where  ||  •  ||  denotes  the  standard  Euchdean  norm. 

Two  functions  are  chosen  from  computational  chemistry:  one  representing  a  typical 
case  of  a  molecule  where  distinct  and  well-separated  conformational  regions  exist  (due 
to  steric  hindr2mce),  and  one  of  a  cluster  of  molecules,  possessing  a  very  large  number 
of  low-energy  configurations  —  which  increase  with  size  —  due  to  a  complex  continuum 
of  architectural  possibilities.  For  both  examples,  natural  preconditioners  arise  from  the 
separabihty  of  the  energy  function  into  local  and  nonlocal  forms  (details  follow)  [12-14]. 
Our  third  problem  is  the  trigonometric  function  [15],  appropriate  here  due  to  the  following 
characteristics:  the  Hessian  and  preconditioner  are  indefinite  at  the  standard  starting  point 
(xo)  and  remciin  so  for  many  iterations  right  until  a  convex  region  is  entered;  there  are 
many  negative  and  clustered  eigenvalues  near  Xo;  minimization  becomes  more  difficult  with 
increasing  size;  and,  since  the  Hessian  is  dense,  it  is  natrual  to  consider  preconditioners 
other  than  diagonals.  It  is  worth  pointing  out  that  for  examination  of  the  MCF  in  our 
context  —  indefinite  preconditioners  in  the  inner  CG  loop  —  "standard"  test  problems 
present  a  difficulty  because  natural,  much  less  indefinite,  preconditioners  are  nontrivial 
to  formulate.  Our  experience  has  been  that  diagonal  preconditioners  typically  work  best; 
unfortunately,  diagonal  matrices  are  poor  choices  for  our  study  because  of  their  simplicity 
8uid  the  consequence  that  the  main  feature  of  S.SjE  —  looking  ahead  in  the  factorization 
for  possible  small  diagonal  candidates  —  is  not  tested.  In  future  work,  we  will  examine  the 
issue  of  preconditioners  for  standard  test  problems  in  more  detail  so  that  another  MCF 
comparison  study  can  be  performed. 

Efficiency  of  overall  convergence  in  our  truncated  Newton  method  can  be  determined 
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from  the  total  number  of  Newton  iterations,  PCG  iterations,  and  function  Sz  gradient 
evaluations.  The  total  cost  of  the  method  depends  on  these  sums  as  well  as  the  cost  of 
solving  the  sparse  system  Mz  =  r  in  PCG.  The  cost  for  this  system  —  solved  by  our  sparse 
MCF  version  —  depends  on  the  sparsity  structure  of  M  and  can  be  as  low  as  0(n)  [3]. 
The  number  of  function  &  gradient  evaluations  is  at  leaist  one  per  Newton  iteration,  more 
than  one  if  required  by  the  line  sesirch.  Overall,  for  truncated  Newton  methods,  the  total 
number  of  PCG  iterations  is  generally  a  good  measure  for  performance  comparison.  The 
total  number  of  Newton  iterations  provides  additioned  information  on  overall  progress  since 
many  Newton  iterations  typically  indicate  that  indefiniteness  (of  H)  has  been  detected,  in 
which  case  the  inner  loop  is  terminated  without  satisfaction  of  the  truncation  criterion. 

It  is  important  to  realize  that  the  effect  of  using  different  MCF  procedures  for  indefinite 
preconditioners  in  combination  with  truncation  of  the  inner  loop  leads  to  different  search 
directions.  Note  that  even  if  the  truncation  criterion  becomes  very  strict  (i.e.,  very  small 
6  in  eq.  (16)),  the  inner  loop  may  be  halted  if  H  is  indefinite  with  detection  of  a  negative- 
curvature  direction.  In  this  case,  different  search  directions  can  also  result.  Thus,  the  effect 
of  the  MCF  on  the  overall  minimization  progress  is  indirect  and  cumulative.  Furthermore, 
in  addition  to  affecting  the  total  number  of  Newton  iterations,  PCG  iterations,  and  function 
&c  gradient  evaluations,  different  MCF  strategies  may  produce  different  minima  if  several 
local  minima  exist.  This  is,  in  fact,  our  experience  with  computationzd  chemistry  problems. 

For  laxge-scale  problems,  the  number  of  PCG  iterations  is  often  limited  to  a  number 
•C  n  due  to  the  prohibitive  cost  of  the  inner  iteration.  Thus,  while  the  effect  of  a  par- 
ticular MCF  strategy  in  one  Newton  step  is  interesting,  more  important  from  a  practical 
point  of  view  is  comparing  the  overall  effect  on  minimization  progress  and  performance. 
Nonetheless,  in  the  examples  below,  we  provide  the  total  number  of  PCG  iterations  for 
each  minimization  run  as  a  cumulative  indicator  of  the  first  question  (the  step-by-step 
effect).  In  all  our  examples,  similar  performance  trends  were  also  observed  when  the  trun- 
cation criterion  was  set  to  be  very  strict,  to  force  small  residuzds  for  the  Newton  equations 
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and  thereby  mimic  exact  Newton  methods.  For  the  first  molecular  model,  we  provide  the 
data  for  such  a  run  alongside  the  standard  'truncated'  version. 

All  computations  were  performed  in  double  precision  on  a  DEC  Vax-3600  (3-processor) 
computer  at  the  Courant  Institute  of  Mathematical  Sciences,  New  York  University.  Ma- 
chine precision  is  of  order  10"^^.  For  the  present  work,  we  have  chosen  to  study  problems 
of  moderate  sizes  to  permit  a  thorough  investigation  of  the  MCF  segment  in  minimization, 
including  eigenvalue  analysis  of  H  and  M  at  every  iteration. 

(A)  A  Nucleic  Acid  Component 

In  the  "static"  semi-empirical  approach  of  computational  chemistry,  a  target  potential 
energy  function  is  minimized  to  find  three-dimensioned  structures  of  biomolecules  in  low 
energy  configurations.  Potential  energy  surfaces  are  typically  very  complex,  involving 
many  local  minima,  maxima,  and  transition  points.  Indeed,  this  problem  structure  makes 
the  MCF  component  in  optimization  particularly  important.  Our  main  area  of  interest 
involves  nucleic  acids,  and  the  first  example  described  here  involves  a  component  of  DNA, 
deoxycytidine  (dC),  which  we  have  studied  in  detail  [12-13].  Biochemical  details  of  our 
molecular  mechanics  and  dynamics  package,  MADPAC,  have  been  discussed  elsewhere  [12- 
14].  We  mention  here  only  that,  for  this  model,  potentied  energy  parameters  have  been 
well  tested  and  specific  local  minima  are  known  experimentally. 

Molecular  preconditioners  used  in  the  MCF  segment  of  TNPACK  consist  of  the  local 
chemical  interactions  between  bonds,  bond  angles,  and  dihedral  angles.  Such  contributions 
tend  to  cluster  near  the  diagonal  when  neighboring  atoms  are  numbered  sequentially.  More 
recently,  penalty  terms  for  rigid-body  rotations  and  translations  were  also  added  to  our 
preconditioner  (this  eliminates  corresponding  zero  eigenvalues  at  the  solution).  We  have 
found  this  choice  of  M  to  be  a  convenient  sparse  approximation  to  the  (dense)  Hessiam 
that  works  well  in  practice.  However,  there  is  no  guarantee  that  M  is  positive  definite, 
even  near  a  local  minimum.    Thus,  the  overall  effect  on  minimization  of  different  MCF 
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methods  will  be  particularly  interesting  to  investigate. 

In  Tables  I-IV,  we  show  results  for  the  dC  model  involving  87  variables.  Tables  I  and  II 
show  the  sets  of  results  from  3  starting  points  (xl,  x2,  x3):  in  Table  I  with  the  truncation 
parameter  6  =  1  (eq.  (16)),  and  in  Table  II  with  6  =  l.OE-08.  Tables  III  and  IV  provide 
eigenvalue  information  on  H  and  A/  at  the  starting  points  and  at  3  computed  solution 
points  (x5,  x6,  x7).  In  Tables  I  and  II  we  show  the  toted  number  of  Newton  iterations, 
PCG  iterations,  and  function  &  gradient  evaluations  for  each  run.  Each  run  from  the  xl, 
x2  and  x3  group  converged  to  the  local  minima  x6,  x5,  emd  x7,  respectively.  The  MCF 
information  per  run  is  given  by  the  minimum,  msLximum,  and  median  values  for  ||£'||oo, 
computed  over  all  Newton  iterations.  Additionadly,  the  number  of  Newton  iterations  for 
which  ||£^||oo  >  0  is  indicated. 

For  dC  minimization,  many  starting  conformations  were  investigated,  and  the  3  differ- 
ent choices  shown  here  are  representative  of  different  energies  and  eigenvalue  distributions 
at  the  initial  points.  In  energy  ranking,  F^l)  <  F(x2)  <  F(x3)  with  corresponding 
energy  values  of  1.4,  1.8E-I-03,  and  5.1E-I-04  Kcal/mol.  All  Hessians  and  preconditioners 
are  indefinite  at  the  starting  points.  From  the  values  of  Xmtn,  ^maz,  and  the  number 
of  negative  eigenvalues  (#NEG)  shown  in  Tables  III  and  IV,  we  see  that  while  M  has  1 
negative  eigenvalue  of  magnitude  O(10~^)  and  a  Xmax  of  O(IO^),  the  distribution  of  H's 
eigenvalues  differs  at  the  3  starting  points.  At  xl,  if  has  2  negative  eigenvalues  with  Xmin 
of  magnitude  0(10°)  and  a  Xmaz  of  0(10'');  at  x2,  H  has  7  negative  eigenvalues  with 
Xfnin  of  magnitude  0(10'*)  zuid  Xmaz  of  O(IO^);  and  at  x3,  H  has  9  negative  eigenvalues 
with  Xmin  of  magnitude  O(IO^)  and  Xmaz  of  O(IO^).  As  for  the  minima  (x5,x6,x7), 
both  H  Eind  M  are  positive  definite  with  reasonable  condition  numbers  of  O(10'*-10^).  In 
energies,  F(x5)  <  F(x6)  <  F(x7)  with  values  of  —6.8,-5.7,  and  —5.0  Kcal/mol,  respec- 
tively. These  energy  differences  are  large  in  molecular  terms  and  separate  well  3  different 
geometric  configurations. 

From  Table  I,  in  which  the  truncation  parameter  was  leirge  {6  =  1),  we  first  note 
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that  overall  convergence  to  a  local  minimum  is  very  rapid  with  the  truncated  Newton 
method:  <  n/4  outer  iterations,  <  n  inner  iterations,  and  ~  n/2  function  &  gradient 
evaluations.  Second,  in  the  MCF  segment,  we  note  that  M  is  modified  in  only  1  (the 
first)  ajid  possibly  another  (typically,  the  second)  iteration.  The  values  of  ||£'||oo  are 
larger  for  GMW  than  for  S&:E,  by  about  a  factor  of  2,  but  in  both  strategies  ||£^||oo  is 
greater  than  Xmin  by  1  or  2  orders  of  magnitude.  When  the  MCF  produces  nonzero 
increments  in  only  one  Newton  iteration,  ||£^||oo  corresponds  to  A^^n  of  M  at  the  starting 
point  (shown  in  Table  III).  When  there  are  two  iterations  with  M  indefinite,  the  maximum 
recorded  value  for  ||£'||oo  corresponds  to  \min  of  M  at  the  starting  point,  while  the  recorded 
minimum  ||£'||oo  corresponds  to  a  second  Xmin  which  is  slightly  smaller  in  magnitude  than 
the  first  (Am,„  =  —0.22  vs.  ||.E||oo  =  2.4  in  the  second  iteration  of  S&E  (a)  for  xl,  and 
Xjnin  =  —0.01  vs.  ||£'||oo  =  6.1  in  the  second  iteration  of  S&E  (a)  for  x3).  The  values 
of  7  Emd  C  (eqns.  (6)  and  (7))  during  the  minimization  runs  Eire  typically  around  2000 
and  1000,  respectively.  In  the  first  Newton  iteration  when  H  is  indefinite,  between  1-3 
PCG  iterations  axe  performed  in  zJl  runs.  The  smaller  S<SiE  increments  do  not  produce 
a  systematic  reduction  of  PCG  iterations  in  this  context.  In  all  MCF  segments,  diagonal 
increments  eire  only  made  in  the  last  3-4  steps  (j  >  83). 

The  overall  trend  to  be  noted  here  is  that  despite  the  fact  that  ||£'||oo  is  systematically 
greater  for  GMW  than  for  S&E,  both  strategies  perform  overall  quite  similarly.  The 
differences  axe  small,  but  the  GMW  version  appears  slightly  more  efficient.  Among  the 
two  S&E  versions  —  with  and  without  the  condition  e^  >  ej_i  imposed  —  performance 
differences  axe  also  small.  Close  analysis  of  the  data  shows  that  relaxing  the  restriction 
that  ej  >  Cj-i  can  lead  to  slightly  greater  ||.E||oo  values  per  Newton  iteration  but  that 
Cj  =  0  for  more  diagonal  components.  Thus,  small  differences  in  performance  depend  on 
the  differences  in  effective  preconditioners  (A/  +  E)  of  H .  Since  nonzero  increments  are 
only  involved  in  the  last  few  steps,  overall  performance  between  all  versions  is  very  similar. 

When  the  truncation  parameter  was  made  more  strict  {6  =  l.OE— 08),  we  note  from 
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the  corresponding  set  of  runs  shown  in  Table  II  that  overall  trends  of  the  S&E  and  GMW 
versions  axe  repeated.  This  behavior  verifies  that  the  cumulative  effects  we  reported  above 
reflect  systematic  differences  in  the  MCF  strategies  (per  step)  rather  than  just  different 
resulting  paths.  All  MCF  increment  values  and  trends  are  very  simileir  to  those  found 
in  Table  I.  The  power  of  truncation  can  also  be  clearly  noted:  overall  convergence  is  not 
sacrificed  by  approximating  the  search  directions  in  early  steps;  although  the  number  of 
Newton  iterations  may  be  slightly  less  in  all  Table  II  runs,  the  required  number  of  PCG 
iterations  is  greater  by  a  factor  of  3-4  than  the  corresponding  values  of  Table  I. 

Overall,  this  rapid  convergence  and  insensitivity  of  the  minimization  to  the  MCF 
strategy  may  be  attributed  to  the  fact  that  this  optimization  problem  is  straightforward, 
with  distinct  convex  conformational  regions.  Indeed,  in  almost  all  runs,  H  becomes  positive 
definite  aifter  the  first  Newton  iteration  and  remains  so.  Moreover,  M  appears  to  be  a  good 
preconditioner  for  this  problem,  and  it  is  positive  definite  except  during  1  or  2  Newton 
iterations.  More  important  for  the  MCF  details,  near  Xq  M  has  only  1  isolated  negative 
eigenvalue;  all  MCF  strategies  modify  only  a  few  diagonals  and  are  all  successful  at  "fixing" 
the  problem  efficiently  and  rapidly. 

As  for  the  magnitude  of  ||i^||oo,  these  runs  further  support  the  result  that  ||jE||oo 
may  be  smeiller  in  practice  for  SSzE  than  for  GMW.  However,  they  also  suggest  that  this 
factor  alone  does  not  imply  better  performance  in  the  context  of  optimization.  In  fact, 
leirger  modifications  to  M  may  result  with  GMW  but  may  also  produce  very  good  descent 
directions,  leading  to  more  efficient  minimization  progress. 

(B)  Water  Clusters 

Our  second  model  of  water  clusters  differs  from  the  dC  model  in  two  major  respects. 
First,  while  there  are  distinct  energy  minima  for  dC  that  are  well  separated  by  high- 
energy  regions,  liquid-water  clusters  possess  many  nearby  local  minima  that  are  close  in 
energy  [14].   This  results  from  the  fact  that  the  clusters  are  stabilized  by  a  random  and 
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rich  intermoleculax  network  of  hydrogen  bonds.  As  the  cluster  size  grows,  the  number  of 
possible  hydrogen-bonded  configurations  increases  enormously.  Second,  as  a  result  of  this 
difference  in  potentiaJ-energy  surface,  the  Hessian  for  water,  unlike  dC,  is  often  indefinite. 
In  most  of  our  runs,  the  inner  loop  terminates  when  a  direction  of  negative  curvature  is 
detected;  this  occurs  for  majiy  Newton  iterations  until  the  last  few,  when  some  convex 
region  is  entered.  This  behavior  typically  results  in  a  large  number  of  inner  and  outer 
iterations  as  well  as  function  Sz  gradient  evaluations  (the  latter,  presumably  to  reduce  the 
energies  sufficiently  when  poor  sejirch  directions  are  produced).  Furthermore,  our  "local 
potential"  preconditioner  for  water  does  not  contain  the  major  energy  contributions  as  in 
the  dC  model.  The  non-local  terms,  in  particular  the  electrostatic  potential  which  leads 
to  appropriate  formation  of  hydrogen  bonds,  is  clearly  domineint.  Thus,  the  MCF  often 
produces  nonzero  modifications  at  every  Newton  iteration.  Since  overall  performance  of 
the  truncated  Newton  method  depends  on  the  convexity  of  the  problem  and  on  the  choice 
of  M,  finding  optimal  cluster  configurations  is  a  much  more  difficult  problem  than  finding 
the  minima  of  dC.  The  added  difficulty  of  choosing  starting  points  compounds  the  problem. 

Results  for  a  water  dimer  model  (2  water  molecules,  3  atoms  per  molecule,  n  =  18) 
are  summarized  in  Table  V.  There  is  only  one  minimum  for  the  dimer,  corresponding 
to  a  linear  hydrogen- bonded  pair.  In  general,  convergence  requires  around  40  Newton 
iterations.  Run  A  used  our  local  M.  For  this  run,  GMW  is  more  efficient  in  terms  of  total 
Newton  and  PCG  iterations  as  well  as  function  Sz  gradient  evaluations.  The  differences  in 
the  ||£^||oo  values  axe  interesting.  GMW  produces  very  small  diagonal  elements  at  the  end 
of  the  factorization  —  O(10~^^-10~'^°)  —  and  consequently  leads  to  increments  in  the  order 
of  the  prescribed  parameter  r  (see  eq.  (9)).  The  S&E  versions,  on  the  other  hand,  switch 
to  Phase  2  earlier  in  the  factorization,  thereby  leading  to  more  diagonal  modifications; 
nevertheless,  this  avoids  formation  of  very  small  diagonal  entries.  To  illustrate,  we  show 
in  Table  VI-(A)  the  diagonal  modifications  for  all  three  MCF  versions  in  the  first  Newton 
iteration.  Similar  behavior  occurs  in  all  other  Newton  iterations. 
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To  exaxnine  these  emerging  trends  more  closely,  we  rein  the  same  minimization  problem 
with  a  different  choice  of  M:  some  of  the  nonlocal  terms  (intramolecular  oxygen-oxygen 
interactions)  were  added  to  M  of  run  A.  Essentially,  we  added  a  small  off-diagonal  (3x3) 
block  to  the  previous  block-diagonaJ  M  made  of  two  9x9  blocks.  Results,  summarized  in 
Table  V  under  run  B,  show  that  overall  performance  of  all  three  MCF  versions  is  now  very 
similar.  However,  the  mean  ||^||oo  from  GMW  is  greater  thaxi  that  of  the  S«^E  versions 
by  about  1  order  of  magnitude.  As  in  run  A,  GMW  begins  diagonal  increments  later  than 
S&E,  ajid  this  leads  to  larger  increments.  This  behavior  is  illustrated  in  Table  VI-(B). 

A  systematic  investigation  was  then  performed  for  water  clusters  of  increasing  size. 
Starting  points  were  chosen  by  positioning  the  molecules  within  a  computational  box,  spec- 
ified by  the  number  of  molecules  along  the  x,  y,  and  z  directions:  each  molecular  oxygen  is 
positioned  at  a  center  of  a  box,  with  its  hydrogens  lying  on  a  unit  sphere  around  it  (their 
positions  are  chosen  pseudorandomly),  and  each  molecule  is  separated  from  its  neighbors 
by  some  chosen  distance  [14].  Such  configurations  axe  typically  high  in  energy  and  very 
far  from  optimal  structures;  in  particular,  large  clusters  adopt  a  spherical,  rather  than 
rectemgulax  arrangement,  so  large  structural  distortions  axe  involved  during  minimization. 
In  addition,  different  cluster  sizes  possess  varying  numbers  of  geometric  arrangements,  so 
while  a  3x3x3  moleculeir  cluster  (27  molecules,  n  =  243)  adopts  an  overall  spherical 
arrangement,  a  2x5x2  (n  =  90)  cluster  can  adopt  a  laxge  number  of  different  shapes. 
This  behavior,  in  combination  with  the  random  2ispect  of  selecting  starting  points,  leads 
to  differences  in  minimization  performance  that  are  not  always  correlated  monotonically  to 
the  cluster  size.  The  limiting  number  of  PCG  iterations  (per  Newton  iteration)  was  set  to 
n  in  cdl  runs  to  malce  the  MCF  comparison  as  fair  as  possible.  Our  truncation  parameter 
6  was  set  to  unity. 

Table  VII  compares  performance  among  our  MCF  versions  for  water  clusters  of  di- 
mensions n  =  27  to  n  =  576,  and  Table  VIII  provides  eigenvalue  information  for  the 
corresponding  starting  and  final  points  and  for  the  water  dimer.  In  the  runs  we  report  for 
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each  dimension,  the  resulting  minima  are  very  close  in  energy  although  they  may  not  be 
identical.  In  such  c£ises,  corresponding  eigenvalue  information  at  the  final  points  is  very 
similar  for  each  group,  and  the  values  we  report  in  Table  VIII  are  representative. 

In  compeirison  with  results  for  the  water  dimer  discussed  above,  similar  behavior  in  the 
Halloo  size  can  be  observed  for  larger  water  models,  but  a  new  overall  trend  emerges  among 
the  MCF  versions  with  increasing  size.  We  note  from  Table  VII  that  the  GMW  version  is 
more  efficient  for  smaller  dimensions.  However,  as  the  size  incre2ises,  the  S&E  strategy  —  in 
particulax  the  S&E(b)  variation  —  becomes  more  efficient  overall.  In  particular,  differences 
in  total  number  of  PCG  iterations  between  the  S&E  and  GMW  versions  become  large  in 
many  cases. 

Close  analysis  of  the  data  reveals  the  following  explanation.  The  Hessian  is  indefinite 
at  the  starting  point  and  remains  so  for  many  iterations;  thus,  the  inner  loop  often  ends 
with  detection  of  negative  curvature.  M  is  also  indefinite,  with  approximately  n/9  negative 
eigenvalues  (all  very  close  to  zero)  throughout  the  run,  forcing  nonzero  modifications  at 
every  step  of  minimization.  Now,  the  GMW  version  produces  ||£^||r)o  sizes  on  the  order 
of  the  set  tolerzmce  r  (O(10~^)),  while  the  Gerschgorin  estimates  of  S&:E  lead  to  much 
larger  modifications  (O(IO')).  Indeed,  behavior  in  lairger  water  clusters  is  very  similar  to 
the  situation  shown  in  Table  VI-(A)  for  the  water  dimer.  Thus,  the  resulting  GMW  search 
directions  often  produce  very  slow  progress  for  mjiny  iterations,  and  the  number  of  H's 
negative  eigenvalues  decreases  slowly;  when  H  becomes  positive  definite,  the  truncation 
criterion  is  stricter  (since  k  is  larger,  see  eq.  (16))  and  many  PCG  iterations  are  then 
involved.  In  fact,  the  last  few  iterations  of  GMW  for  large  dimensions  often  end  when  the 
maximum  number  of  allowed  PCG  iterations  has  been  reached.  The  S<Sj:E  strategy,  on  the 
other  hand,  succeeds  in  entering  positive  definite  regions  of  H  more  quickly,  and  progress 
is  typically  rapid  therejifter.  The  inner  loop  then  terminates  successfully  with  satisfaction 
of  the  permitted  residual  (rather  than  excess  of  PCG  iterations),  although  the  number  of 
PCG  iterations  involved  may  be  leu-ge.  Version  (b)  of  S&E  is  more  competitive  than  version 
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(a)  because  diagoned  modifications  are  made  less  often,  so  a  smaller  number  of  total  PCG 
iterations  typicedly  results.  When  we  performed  the  same  runs  with  a  stricter  truncation 
criterion  (as  in  the  dC  example),  results  were  very  similar  to  those  reported  above  since  a 
stricter  criterion  affects  only  iterations  that  do  not  end  in  detection  of  negative  curvature, 
and  these  iterations  only  occur  at  the  end,  where  already  the  Newton  equations  are  solved 
more  accurately. 

Why  is  the  GMW  version  competitive  for  smaller  dimensions?  The  number  of  negative 
eigenvalues  seems  to  play  a  key  role  in  steering  the  minimization  progress.  Note  from 
Table  VIII  that  H  has  about  n/3  negative  eigenvalues  at  the  starting  point,  with  a  Xmin  of 
magnitude  0(10° -10^)  and  a  A^ar  of  O(IO^).  The  number  of  A/'s  eigenvalues  is  smadler  and 
about  n/9,  but  they  axe  all  very  small  in  magnitude,  clustered,  and  persistent.  In  fact,  the 
existence  of  zero  eigenvalues  for  M  results  from  the  interdependence  of  the  potential  energy 
derivatives  in  the  x,  y,  and  z  directions  for  terms  associated  with  each  water  molecule. 
Thus,  when  only  a  few  diagonal  candidates  are  small  (say,  1-3),  GMW  is  more  efficient 
because  the  minimal  amount  of  modification  is  sufficient  and  does  not  affect  the  computed 
seau-ch  direction  dramatically.  However,  as  the  number  of  small  diagonals  increases,  slow 
progress  results  from  many  more  resulting  diagonals  of  magnitude  r,  10~*.  The  S^E 
strategy  detects  indefiniteness  early  in  the  factorization  (Phase  1  is  typically  false  very 
early  in  the  Cholesky  factorization,  sifter  about  10  steps)  and  modifies  the  diagonals  more 
substantially.  This  preventive  strategy  "pays  off"  in  overall  performance  as  the  size  and 
complexity  of  the  problem  increase. 

These  results  suggest  that  resetting  very  small  prescribed  tolerance  psirajneters  can 
slow  minimization  progress  considerably.  This  situation  can  occur  for  problems  with  many 
neax-zero  eigenvalues,  a  typical  one  in  computational  chemistry  because  of  translation  and 
rotation  invariance  as  well  as  dependence  of  the  potential-energy  derivative  components. 
They  further  suggest  that  the  S&E  strategy  may  be  very  effective  for  handling  such  complex 
large-scale  problems. 
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(C)  The  Trigonometric  Function 

To  examine  the  preventive  S&E  strategy  and  resulting  modifications  further,  we  focus 
on  a  different  problem  structure  where  majiy  negative  eigenvalues  of  larger  magnitudes  are 
present.  The  trigonometric  function  of  variable  dimension  n  is  given  as  [15]: 

F(x)  =      y^      (  n  —  N^cosii  +  j(l  —  cosxj)  —  sinxj  J     .  (19) 

>  =  l,...,n    ^  :=1  ' 

The  recommended  starting  point  is  Xo  =  (1/n, . . . ,  1/n)-^,  at  which  H  is  indefinite.  We 
study  4  sets  of  minimization  runs  from  Xo,  for  n  =  50,  100,  250,  and  1000,  and  for  each 
set  we  use  diagonal,  tridiagonal,  and  pentadiagonal  preconditioners,  constructed  from  the 
Hessian.  We  set  our  truncation  parameter  6  to  unity  and  the  limiting  number  of  PCG 
iterations  (per  Newton  step)  to  10.  This  limit  is  imposed  because  H  is  dense  so  the 
inner  iterations  axe  costly.  Typically,  the  residual  is  very  small  Jifter  a  small  number 
of  PCG  iterations,  and  this  limit  has  only  a  small  effect,  if  amy,  on  the  last  1-2  Newton 
iterations.  Additionally,  since  the  residued  becomes  very  small  quickly,  a  stricter  truncation 
criterion  has  little  effect  on  performance.  For  all  runs,  we  use  a  stricter  overall  minimization 
termination  criterion  of  ||g||/\/n  <  l.OE— 12  *  max{l,  ||x||}  ,  since  ||g(xo)||  is  small  and 
we  wajited  to  ensure  that  a  region  of  quadratic  convergence  is  well  entered.  When  such 
a  region  is  entered,  the  gradient  is  reduced  very  rapidly,  so  this  strict  condition  does  not 
present  a  problem. 

In  Table  IX  we  show  results  for  the  4  sets  of  runs,  axid  in  Table  X  we  list  the  eigenvalues 
of  H  cind  the  pentadiagonaJ  M  at  both  Xo  axid  x.  (solution).  For  the  other  choices  of  M  — 
diagonal  ajid  tridiagonaJ  —  only  small  changes  for  Xmin  were  noted  (  ~  0.01  -  0.05).  The 
typical  vsdues  of  7  and  (  for  this  function  are  7  ~  2.0  and  ^  in  the  range  O(10~^-10~^). 
From  the  eigenvalue  table  we  note  immediately  that  at  Xo  both  H  and  M  have  60%  of 
their  eigenvedues  negative.  Moreover,  our  eigenvalue  analysis  shows  that  all  eigenvalues 
have  magnitudes  O(10~^-10~^),  most  of  them  of  O(10~M-  For  example,  for  n  =  100. 
the  61  negative  eigenvalues  of  the  pentadiagonal  M  occur  in  increeusing  order  of  {  —  0.591, 
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-0.591,  -0.584,  -0.584,  -0.577,  -0.577,  -0.571,  -0.571,  -0.570,  -0.565,  -0.561,  -0.559, 
-0.556,  -0.552,  . . .,  -0.140,  -0.112,  -0.069,  -0.045,  -0.019}.  At  Xq,  Xmax  is  about  1.5 
for  H  ajid  M,  so  the  eigenvalues  axe  concentrated  in  the  [  —1.0,1.5  ]  interval.  At  x,,  H 
and  M  are  positive  definite,  with  Xmm  of  O(10~^)  and  \max  of  about  3.0  for  H  and  2.0 
for  M. 

From  the  runs  with  diagonal  M  we  note  from  Table  IX  that  the  GMW  strategy 
is  the  most  effective  for  each  dimension.  The  required  number  of  Newton  iterations  is 
considerably  less  than  for  the  S<S;E  runs  (except  for  n  =  50,  where  it  is  the  same),  and 
the  total  number  of  PCG  iterations  is  smaller.  The  number  of  iterations  involving  nonzero 
modifications  to  M  is  also  smaller  for  GMW  for  n  >  50  (equal  to  S^E  for  n  =  50). 
Note  that  overedl  superiority  of  GMW  occurs  even  though  the  size  of  ||£'||oo  is  typically 
greater  than  that  of  S&E,  by  a  factor  of  2.  This  relation  is  expected  from  the  modification 
formulas  (eqs.  (9)  and  (13))  since  ^  =  0  and  r  is  greater  than  the  negative  Ccindidate 
diagonal.  Thus,  while  GMW  generally  modifies  negative  diagonals  rrijj  to  \mjj\  ,  S&E 
produces  elements  of  magnitude  r.  These  small  diagonals  produce  search  directions  that 
lead  to  slower  progress  during  minimization  for  S&E  than  for  GMW  (further  discussion  is 
given  below). 

We  note  an  interesting  trend  when  ofF-diagoneil  elements  are  added  to  M.  In  short,  the 
relative  performance  of  GMW  is  now  slowed  —  more  Newton  iterations,  PCG  iterations, 
and  nonzero  modifications  —  while  performance  of  S&E  is  greatly  improved.  In  particular, 
£LS  the  size  of  the  problem  increases,  version  S&E(a)  becomes  superior.  Thus,  the  preventive 
strategy  of  S&E  works  very  well  for  problems  with  this  eigenvalue  structure,  as  it  is  very 
effective  at  getting  out  of  indefinite  regions  more  quickly.  The  small  number  of  Newton 
iterations,  especially  for  S&:E(a),  results  since  H  and  M  become  positive  definite  quickly, 
and  rapid  progress  is  made  thereafter. 

This  difference  in  progress  can  be  seen  from  Figure  2,  which  shows  Xmin  vs.  Halloo 
for  the  3  MCF  versions  for  the  run  with  n    =    250  with  tridiagonal  M.  During  the  first  5 
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Newton  iterations,  the  number  of  negative  eigenvalues  of  M  in  the  GMW  version  is  reduced 
slowly  from  152;  the  GMW  sequence  of  negative  eigenvalues  per  Newton  iteration  starting 
from  iteration  0  is  {152, 135, 102,44,55,31},  while  the  corresponding  S&:E(a)  and  S<L'E(b) 
sequences  are  {152,5,2,0,0,0}  and  {152,5,4,0,0,0},  respectively.  At  later  iterations,  the 
number  of  negative  eigenvalues  in  the  GMW  run  ranges  from  0  to  65,  while  the  range  for  the 
S&E  versions  is  between  0  and  8.  An  examination  of  the  data  shows  that  the  first  Newton 
iteration  is  quite  similar  for  all  MCF  versions  in  the  modification  schedule:  GMW  modifies 
diagonals  99  <  j  <  250,  emd  S&E  enters  Phase  2  at  j  =  97.  The  GMW  and  S&E  versions 
perform  2  and  1  PCG  iterations,  respectively,  at  this  first  Newton  iteration  until  a  direction 
of  negative  curvature  is  detected.  Now,  at  the  second  Newton  iteration,  GMW  modifies 
negative  diagonal  candidates  in  the  range  96  <  j  <  250,  of  magnitude  range  O(10~^- 
10"''),  while  both  S&:E  versions  enter  Phase  2  at  j  =  153  and  modify  few  small  negative 
diagonals  and  small  positive  diagonal  candidates  of  O(10~'^-10~'*);  consequently,  future 
diagonals  do  not  become  negative.  This  trend  between  the  3  MCF  versions  repeats  for 
several  Newton  iterations,  involving  less  and  less  diagonal  modifications  at  each  iteration, 
emd  progress  is  particularly  rapid  for  S&E(a).  Version  S&E(b)  is  slower  than  S<SjE(a) 
because  many  resulting  increments  are  of  order  r  when  the  condition  Cj  >  Cj-i  is  omitted, 
and  smaller  diagoneil  elements  appear  in  later  Cholesky  steps. 

Thus,  the  tendency  of  small  diagonal  candidates  to  form  is  reversed  between  the  GMW 
and  S&E  versions  for  our  diagonal  and  non-diagonal  M  runs.  Note,  however,  that  this 
tendency  is  overcome  in  each  version  by  a  different  strategy  —  lairger  GMW  increments 
in  the  diagonal  case,  and  the  S&E  preventive  schedule  in  the  non-diagonal  cases.  The 
outcomes  in  each  Ccise  explain  our  noted  diff"erences  in  performance  among  the  3  MCF 
strategies  for  the  trigonometric  function.  Relating  this  to  our  previous  examples,  we  recall 
that  for  the  water  clusters,  a  systematic  trend  of  small,  positive  diagonals  of  order  r  also 
slowed  overall  minimization  progress.  Together,  these  results  further  support  the  great 
effectiveness  of  the  S&E  strategy  for  general  indefinite  matrices  with  negative  eigenvalues 
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that  have  magnitudes  greater  than  r.  The  clustering  of  eigenvalues  also  seems  to  make 
the  S&E  strategy  more  effective,  because  the  preventive  modification  schedule  improves 
performance  later  on.  Interestingly,  it  has  already  been  noted  that  Si:E  produces  the 
most  significant  improvement  in  ||£'||oo  over  GMW  for  problems  where  the  eigenvalues  are 
clustered  between  —1  and  1  [1]. 

VI.  SUMMARY 

We  have  discussed  the  implementation  of  the  new  modified  Cholesky  factorization 
(MCF)  of  Schnabel  &z  Eskow  (S&E)  in  the  context  of  Wge-scale  nonlinear  optimization. 
A  version  of  the  factorization  was  implemented  in  our  optimization  program  TNPACK,  a 
large-scale  truncated-Newton  package.  The  MCF  is  used  in  the  Preconditioned  Conjugate 
Gradient  component  (PCG)  to  solve  a  linear  system  involving  the  preconditioning  matrix 
of  the  Newton  equations.  The  preconditioner  is  often  chosen  from  the  physics  of  the 
problem  but  is  not  guziranteed  to  be  positive  definite.  Both  the  S&E  and  the  Gill,  Murray, 
and  Wright  (GMW)  factorizations  were  incorporated  into  TNPACK,  and  performance 
comparisons  were  made  for  sample  problems  of  different  eigenvgJue  distributions. 

Two  areas  have  been  discussed  in  this  paper:  implementation  in  large-scale  opti- 
mization, and  performance  results.  The  no-pivoting  version  of  S<S^E  is  used  because  large 
systems,  when  subjected  to  direct  factorizations,  are  typically  sparse.  Thus,  reordering 
is  often  done  a  priori.  Moreover,  in  mamy  physical  applications,  sparsity  patterns  are 
structured  and  pivoting  is  not  done  at  all,  since  it  may  destroy  optimal  patterns. 

Implementation  of  S&E  differs  from  GMW  both  in  the  form  of  the  recipe  that  deter- 
mines the  diagonal  increments  and  in  the  manner  by  which  the  recif>e  is  activated.  While 
GMW  subjects  all  candidate  diagonals  to  the  same  modification  formula,  S&E  employs  a 
two-phase  strategy.  Phase  1  denotes  a  state  when  all  diagonal  increments  axe  zero  (posi- 
tive definite  matrices  are  always  in  Phase  1),  and  Phase  2  is  entered  when  any  prospective 
diagonal  in  the  remaining  submatrix  is  small.   Only  at  Phase  2  is  the  Gerschgorin-based 
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recipe  of  S&E  applied.  Whereas  the  phase  test  in  S&E  requires  that  the  diagonal  elements 
of  the  matrix  be  updated  at  every  step,  this  is  not  a  requirement  in  GMW.  For  the  actual 
recipe  of  S<SjE,  we  suggest  two  variations  that  may  work  better  in  certain  optimization 
applications:  1)  Cj  is  not  required  to  be  as  large  as  e^-i,  and  2)  the  same  strategy  is  used 
for  cdl  the  e/s  (instead  of  using  actual  eigenvalues  in  the  last  two  steps).  We  used  the 
same  strategy  for  the  e^'s  throughout  this  work,  and  we  compared  two  S<SjE  versions  where 
ej  >  Cj-i  was  enforced  (S<SiE,a)  and  where  it  was  not  (S<Sj:E,b). 

In  the  numerical  experiments,  we  focused  on  the  overall  efficiency  and  performance 
of  minimization  when  different  MCF  strategies  are  used  to  produce  positive  definite  pre- 
conditioners.  Clearly,  the  overall  performance  is  the  important  aspect  in  practice,  and  it 
is  certainly  of  major  importance  when  the  number  of  PCG  iterations  is  limited  to  a  small 
number  due  to  computational  cost  of  the  problem.  Although  the  effect  of  different  MCF 
methods  is  cumulative,  systematic  trends  have  emerged  in  this  work.  In  this  respect,  the 
truncated  Newton  formulation  is  a  particularly  interesting  vehicle  to  test  these  different 
MCF  strategies  because  the  cumulative  effects  are  easier  to  observe.  An  examination  of 
analogous  runs  where  the  truncation  criterion  was  set  to  be  more  strict  exhibited  the  same 
overall  performance  trends. 

The  three  problems  we  exeimined  in  detail,  by  varying  size,  preconditioners,  starting 
points,  and  truncation  parameters,  involve  three  different  eigenvalue  distributions  that  are 
typically  encountered  in  practice:  an  isolated  negative  eigenvalue,  severed  very  small-in- 
magnitude  (near  zero)  positive  and  negative  eigenvalues,  and  a  large  percentage  of  negative 
eigenvaiues.  In  the  first  case,  represented  by  a  molecular  model  with  distinct  and  well- 
separated  conformationcd  minima,  nonzero  diagonal  modifications  to  M  were  required  in 
only  1  or  2  Newton  iterations.  The  GMW  and  S.S>:E  factorizations  behaved  very  similarly, 
though  GMW  was  slightly  more  efficient  despite  the  fact  that  ||£'||oo  was  typically  higher 
by  a  factor  of  2.  In  the  second  case,  a  molecular-cluster  model  possessing  numerous  and 
clustered  minima  and  a  characteristically  indefinite  potential  surface,  nonzero  modifica- 
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tions  were  made  in  edl  iterations.  Now  GMW  was  more  efficient  for  smaller  dimensions  but 
S&E(b)  became  much  better  as  size  increased;  ||£^||oo  was  typically  around  the  set  tolerance 
r,  0(10-^),  for  GMW  and  of  0(10^-10^)  for  the  S&E  versions.  In  the  third  case  of  the 
trigonometric  function,  nonzero  modifications  were  involved  at  some  Newton  iterations. 
GMW  performed  better  for  diagoned  matrices  of  various  dimensions,  but  S&E(a)  became 
superior  as  off-diagoneds  were  added.  ||.E||oo  was  typically  higher  for  GMW  by  a  factor  of 
2  or  more.  Our  results  have  shown  that  the  relative  performance  between  the  GMW  and 
S&E  factorizations  depends  on  such  different  problem  characteristics.  Our  observations 
can  be  summarized  a.s  follows: 

(1)  S&E  may  indeed  lead  to  smcdler  values  of  ||£'||oo  than  GMW,  as  suggested  by 
theoretical  estimates.  Nonetheless,  the  size  of  ||.E||oo  may  not  be  the  crucial  factor  in 
the  context  of  optimization;  the  modification  schedule  and  its  consequences  are  also  very 
important  for  maintaining  numerical  stability  and  directing  progress.  Thus,  GMW  can  be 
more  efficient  in  some  cases,  even  when  ||£'||oo  is  larger. 

(2)  The  two-phase  strategy  of  S&E  cleaxly  leads  to  earlier  modifications  of  the  diagonal 
elements  than  GMW.  While  causing  more  perturbations  to  the  original  matrix,  the  SSzE 
phase-switch  strategy  nonetheless  avoids  formation  of  more  negative  elements  that  tend  to 
appear  with  GMW  later  in  the  factorization.  This  strategy  can  be  very  effective  at  getting 
out  of  indefinite  regions,  especially  in  cases  when  there  are  many  near- zero  eigenvalues. 

(3)  The  two  variations  we  suggested  for  the  S&E  increment  strategy  work  well  when 
applied  to  laxge,  sparse  linear  systems  in  the  context  of  optimization.  The  special  formula 
for  e„  and  Cn-i  may  be  unnatural  for  problems  with  inherent  symmetry,  and  implementing 
the  requirement  that  Ej  >  €j-\  can  lead  to  unnecessary  perturbations  of  the  original 
matrix.  However,  this  requirement  may  be  more  effective  for  problems  with  many  negative 
eigenvalues  as  a  means  of  accelerating  progress. 

It  appejirs  that  the  basic  question  with  the  S&E  factorization  versus  GMW  in  the 
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context  of  optimization  is:  does  the  preventive  strategy  of  S&E  "pay  off"?  For  problems 
with  very  few  and  isolated  negative  eigenvalues,  the  GMW  strategy  seems  to  be  more 
effective:  it  remedies  the  problem  only  as  necessary.  The  SSzE  strategy,  on  the  other  hand, 
may  rely  on  large  Gerschgorin  estimates  and  thus  may  lead  to  unnecessary  modifications 
which  do  not  accelerate  overall  progress.  However,  for  problems  characterized  by  highly 
indefinite  regions,  especially  with  clustered  negative  eigenvalues,  the  S&zE  strategy  is  very 
effective.  The  probability  of  encountering  many  negative  diagonals  is  high,  so  the  schedule 
and  size  of  the  modifications  are  critical.  In  particular,  a.s  the  size  and  complexity  of  the 
problem  increases,  the  relative  superiority  of  the  S^E  increases  because  by  remedying  the 
problem  eaxlier,  Si:E  leads  to  better  numerical  behavior  during  the  factorization  (i.e.,  no 
overflow  /  underflow ) . 

These  results  suggest  the  following  classification  of  cases  where  the  two  methods  will 
be  effective.  The  GMW  technique  may  be  more  effective  for  minimization  of  functions  that 
are  well  approximated  by  local  quadratic  models  in  the  region  of  interest.  In  this  case,  H 
—  and  M  in  our  truncated  Newton  context  —  are  likely  to  be  close  approximations  to  a 
positive  definite  matrix.  The  S&E  strategy  may  work  better  for  large,  indefinite  systems, 
especially  with  majiy  clustered  negative  eigenvalues,  where  diagonal  candidates  tend  to 
grow  more  negative  if  modifications  begin  too  late.  This  may  occur  in  minimization  for 
starting  points  that  axe  far  away  from  local  minima;  for  functions  that  possess  many  local 
minima,  as  in  computational  chemistry  applications;  and,  in  general,  for  highly  nonlinear 
objective  functions.  Indeed,  the  locsd  convexity  of  the  model  has  been  suggested  as  an 
important  factor  in  evaluating  performance  of  the  truncated  Newton  method  in  relation  to 
other  large-scale  optimization  methods  [17].  Our  results  suggest  that  incorporation  of  the 
S&E  modified  Cholesky  factorization  in  minimization  of  highly  nonlinear  functions  may 
greatly  improve  the  performance  and  competitiveness  of  Newton  methods. 
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Figure  1:  Step  j  of  the  Modified  C/wlesky  Factorization 


-  Before  step  j  begins 


-  After  step;  ends 


contains  final  L  and  D  factors 


ISi  contains  the  auxiliary  quantities  of  LD 


0  contains  updated  diagonal  elements 


Figure  2:     ?Lmin  vs.  \\E  \  \^for    the  Trigonometric  Function 
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mcf. tables 


Table  I 


Molecular  Model  1:  Deoxycytidine  (dC) , 
(<f  =  1.0  in  truncation  criterion) 


87 


Run 

1   Newton 
1    Itns. 

PCG 

Itns. 

F&G 
Evals . 

Min. 

1  IE  1  loo 
Max. 

Nonzero 
Med.     Itns. 

(XI) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    15 
1    17 
1    16 

71 
84 
81 

38 
43 

41 

1.3E+01 
2.4E+00 
5.6E+00 

1.3E+01 
5.6E+00 
5.6E+00 

1.3E+01 
4.0E+00 
5.6E+00 

1 
2 

1 

(X2) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    19 
1    21 
1    19 

79 
79 
86 

48 
50 
46 

1.4E+01 
7.2E+00 
7.2E+00 

1.4E+01 
7.2E+00 
7.2E+00 

1.4E+01 
7.2E+00 
7.2E+00 

1 
1 
1 

(X3) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    13 
1    13 
1    14 

48 
49 
57 

28 
31 
27 

1.3E+01 
6.1E+00 
7.2E+00 

1.3E+01 
7.2E+00 
7.2E+00 

1.3E+01 
6.7E+00 
7.2E+00 

1 
2 

1 

Table  II 

Molecular  Model  1:  Deoxycytidine  (dC) ,  n  =  87 
(rf  =  l.OE-08  in  truncation  criterion) 


(XI) 


(X2) 


(X3) 


Run 

GMW 

S&E, (a) 
S&E, (b) 


GMW 

S&E, (a) 
S&E, (b) 


GMW 

S&E, (a) 
S&E, (b) 


Newton  PCG    F&G 
Itns.  Itns.  Evals, 


Min. 


MEM, 
Max. 


Med. 


1.3E+01  1.3E+01  1.3E+01 
5.6E+00  5.6E+00  5.6E+00 
5.6E+00   5.6E+00   5.6E+00 


Nonzero 
Itns. 

1 
1 
1 


1.4E+01  6.6E+01  4.0E+01  2 
7.2E+00  7.2E+00  7.2E+00  1 
7.2E+00   7.2E+00   7.2E+00      1 


1.3E+01  1.3E+01  1.3E+01  1 
4.7E+00  7.2E+00  6.0E+00  2 
7.2E+00   7.2E+00   7.2E+00      1 


Schliclc  - 


mcf .tables 


Table  III 
Eigenvalue  Information  for  dC  Starting  Points 


Point 


(XI) 


(X2) 


(X3) 


Amin 

Amax 

#NEG 

============= 

====== 

-1.4E+00 

2.9E+03 

2 

-3.2E-01 

2.8E+03 

1 

-1.2E+04 

1.8E+05 

7 

-1.2E-01 

2.9E+03 

1 

-8.2E+05 

l.lE+07 

9 

-1.2E-01 

2.8E+03 

1 

F(X1)  =  1.4E+00 
F(X2)  =  1.8E+03 
F(X3)  =  5.1E+04 


Table  IV 
Eigenvalue  Information  for  dC  Minima 


(X5) 


(X6) 


{X7) 


Amin 

Amax 

#NEG 

============ 

=========== 

7.7E-02 

3.0E+03 

0 

3.7E-01 

3.0E+03 

0 

3.8E-02 

2.9E+03 

0 

2.6E-01 

2.9E+03 

0 

2.1E-02 

3.1E+03 

0 

3.8E-01 

3.1E+03 

0 

F(X5)  =  -6.8E+00 
F{X6)  =  -5.7E+00 
F(X7)  =  -5.0E+00 


-  T.  Schlick  - 


mcf .tables 


Table  V 
Molecular  Model  2:  Water  Dimer,  n  =  1? 


Run 

1   Newton  PCG 
1    Itns.  Itns 

F&G 
.  Evals . 

Min. 

1  lEI  |<^ 
Max. 

Nonzero 
Med.     Itns. 

(A) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    38 
1    47 
1    43 

244 
322 
296 

100 
132 
120 

1.5E-05 
9.5E-01 
9.5E-01 

1.6E-05 
l.lE+03 
l.lE+03 

1.6E-05 
5.5E+02 
5.5E+02 

38 
47 
43 

(B) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    46 

1    45 
1    45 

294 
297 
297 

129 
127 
134 

l.OE+01 
6.0E+00 
1.5E'+01 

6.7E+03 
9.2E+02 
9.2E+02 

3.4E+03 
4.6E+02 
4.7E+02 

46 
45 
45 

Table  VI- (A) 
MCF  Analysis  for  the  First  Newton  Iteration  of  Table  V,  Run  A 


1        GMW 

S&E, 

(a) 

S&E, 

(b) 

j    1 

1    d. 

.       0 

d.+e. 

d. 
j_ 

350 

d.+e. 
J. !_ 

1452 

d. 
i 

350 

d.+e. 

10    1 

1452 

11   1 

1102 

2105 

1102 

2005 

12    1 

1  -2.1E-13 

1.5E-05 

266 

1368 

266 

837 

13    1 

1   3.6E-15 

1.5E-05 

86 

1189 

77 

230 

14    1 

1   1.6E+02 

0 

349 

1451 

288 

288 

15    1 

1  -5.0E-14 

1.5E-05 

71 

1173 

31 

62 

16    1 

1   3.6E-15 

1.5E-05 

86 

1188 

72 

201 

17    1 

1  -3.6E-15 

1.5E-05 

341 

1444 

209 

209 

18    1 

1  -5.0E-14 

1.5E-05 

70 

1172 

13 

13 

I  lEI  loo  =1.5E-05 


I  lEI  !«,  =1103 


I  |E|  Ijx,  =1103 


Table  VI- (B) 
MCF  Analysis  for  the  Tenth  Newton  Iteration  of  Table  V,  Run  B 


1       GMW 

S&E, 

(a) 

S&E, 

(b) 

j    1 

1    d.      d.+e. 

d. 
i 

d,+e. 

d. 

d.+e. 

14    1 

1       * 

83 

128 

94 

169 

15    1 

1  -  0.2       0.2 

0.3 

46 

3 

20 

16    1 

1    * 

12 

58 

12 

15 

17    1 

1-13       13 

22 

68 

* 

18    1 

1  -104       104 

-  0.5 

45 

-  0.6 

0.02 

||Elt^=208         ||E|loo=46      IIE||^=75 
Entries  leading  to  zero  modifications  were  not  recorded 
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Table  VII 
Molecular  Model  2 :  Water  Clusters 


I  I  Newton  PCG    F&G  I  I  I E Moo          Nonzero 

n     Run  I  I  Itns .  Itns .  Evals.  1  Min .      Max.  Med.  Itns . 

GMW  II  38  506     67  |  1.5E-05  6.6E-01  3.3E-01  38 

(27)   S&E,  (a)  II  63  1147     94  I  3.3E+02  1.6E  +  03  l.OE  +  03  63 

S&E, (b)  II  68  1053  106  I  5.0E+02  1.5E+03  l.OE+03  68 

GMW  II  70  1400  114  I  1.5E-05  1.8E-05  1.7E-05  70 

(36)   S&E, (a)  II  71  1783  110  I  3.9E+02  1.5E+03  9.5E+02  71 

S&E, (b)  II  75  1554  128  I  3 . 9E+02  1.6E+03  l.OE+03  75 

GMW  II  81  2256  148  I  1.5E-05  1.9E-05  1.7E-05  81 

(45)   S&E, (a)  II  99  3606  159  I  5.4E+02  1.6E+03  l.lE+03  99 

S&E, (b)  II  94  2598  162  I  9.6E+02  1.5E+03  1.2E+03  94 

GMW  II  116  4744  212  |  1.5E-05  1  .  9E-05  1.7E-05  116 

(54)   S&E,  (a)  II  116  5554  179  I  9.9E  +  02  1.6E+03  1.3E+03  116 

S&E, (b)  II  108  3886  158  I  l.lE+03  1.6E+03  1.4E+03  108 

GMW  II  85  4771  157  |  1.5E-05  1.9E-05  1.7E-05  85 

(72)   S&E,  (a)  II  64  3155  102  I  8.0E+02  1.6E  +  03  1.2E+03  64 

S&E, (b)  II  54  1848     89  I  8 . 4E+02  1.6E+03  1.2E+03  54 

GMW  II  112  8524  200  I  1.5E-05  1.9E-05  1.7E-05  112 

(90)   S&E, (a)  II  85  5804  133  I  l.OE+03  1.6E+03  1.3E+03  85 

S&E, (b)  II  85  4879  154  |  l.lE+03  1.6E+03  1.3E+03  85 

GMW  II  143  12941  270  I  1.5E-05  1.3E+01  6.4E+00  143 

108)   S&E,  (a)  II  101  7573  162  I  1.2E+03  1.5E+03  1.4E+03  101 

S&E, (b)  II  97  6285  174  |  l.OE+03  1.7E+03  1.4E+03  97 

GMW  II  132  15702  249  |  1.6E-05  2.3E-05  1.9E-05  132 

144)   S&E,  (a)  II  118  12217  217  |  1.2E  +  03  1.6E+03  1.4E+03  118 

S&E, (b)  II  97  8631  163  I  1.2E+03  1.6E+03  1.4E+03  97 

GMW  II  88  10314  236  I  1.5E-05  1.8E-05  1.6E-05  88 

180)   S&E, (a)  II  47  3816  104  |  1.3E+03  1.6E+03  1.5E+03  47 

S&E, (b)  II  47  2657  107  |  l.lE+03  1.7E+03  1.4E+03  47 

GMW  II  71  8173  212  I  1.4E-05  1.7E-05  1.6E-05  71 

243)   S&E,  (a)  II  56  5828  104  I  l.lE  +  03  1.6E  +  03  1.4E  +  03  56 

S&E,  (b)  II  60  4880  137  |  l.lE  +  03  1.8E  +  03  1.5E  +  03  60 

GMW  II  104  20543  277  |  1.4E-05  1.9E-05  1.7E-05  104 

324)   S&E, (a)  ||  53  4450  120  I  1.2E+03  1.7E+03  1.5E+03  53 

S&E, (b)  II  55  4581  123  I  l.lE+03  1.7E+03  1.4E+03  55 

GMW  II  100  23502  407  |  1.4E-05  1.6E-05  1.5E-05  100 

576)   S&E,  (a)  II  66  9040  195  I  1.3E  +  03  1 . 6E+03  1.5E+03  66 

S&E, (b)  II  57  7732  139  I  1.4E+03  1.7E+03  1.6E+03  57 


Schlick  - 
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Table  VIII 
Eigenvalue  Information  for  Water  Clusters 


II  XO  I  X* 

n  I  I    Amin  Amax  #NEG  I    Amin  A^ax  #NEG 

(18)   H  II  -2.4E+00  2.2E+03  3  I  9.8E-02  2.2E+03  0 

M  II  -1.6E-13  2.1E+03  1  I  -1.7E-14  2.2E+03  2 

(27)   H  II  -4.5E+01  2.3E+03  8  I  5.4E-02  2.3E+03  0 

M  II  -1.3E-13  2.3E+03  2  I  -1.2E-14  2.2E+03  2 

(36)   H  II  -4.5E+01  2.3E+03  11  I   1.5E-02  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  3  I  -2.8E-14  2.2E+03  5 

(45)   H  II  -4.5E+01  2.3E+03  14  |   1.9E-04  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  4  I  -2.8E-14  2.2E+03  4 

(54)   H  II  -5.6E+01  2.3E+03  17  |   9.6E-03  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  7  |  -2.1E-14  2.2E+03  5 

(72)   H  II  -5.6E+01  2.3E+03  23  I   8.9E-03  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  9  I  -3.6E-14  2.2E+03  6 

(90)   H  II  -l.lE+02  2.3E+03  28  I   1.2E-03  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  12  I  -2.8E-14  2.2E+03  7 

(108)   H  II  -7.5E+01  2.3E+03  34  |   4.1E-03  2.3E+03  0 

M  II  -1.6E-13  2.3E+03  12  I  -2.8E-14  2.2E+03  9 

(144)   H  II  -9.6E+01  2.3E+03  45  I   8.3E-03  2.3E+03  0 

M  II  -1.5E-13  2.3E+03  17  |  -3.4E-14  2.2E+03  14 

(180)   H  II  -l.lE+01  2.4E+03  54  I   3.3E-03  2.3E+03  0 

M  II  -1.7E-13  2.3E+03  24  |  -3.3E-14  2.2E+03  16 

(243)   H  II  -l.lE+02  2.4E+03  70  I   2.6E-03  2.3E+03  0 

M  II  -2.5E-13  2.3E+03  31  I  -2.9E-14  2.2E+03  25 


-  T.  Schliclc  - 
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Table  IX 
Trigonometric  Function  Minimization 


n  =  50 


M 

Run 

1   Newton  PCG 
1    Itns.  Itns. 

F&G 
Evals . 

Min. 

1  lEI  I50 
Max. 

Nonzero 
Med.     Itns. 

(D) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    15 
1    15 
1    22 

79 
88 
84 

22 
41 
38 

l.lE+00 
5.5E-01 
2.1E-03 

l.lE+00 
5.5E-01 
5.5E-01 

l.lE+00 
5.5E-01 
2.8E-01 

1 
1 
9 

(T) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    22 

1    15 
1    13 

102 
108 
107 

54 
20 
18 

3.0E-02 
4.2E-02 
3.7E-02 

1.5E+00 
6.0E+00 
6.0E-01 

7.4E-01 
3.2E-01 
3.2E-01 

14 
7 
6 

(P) 

GMW 
S&E, 
S&E, 

(a)  1 

(b)  1 

1    30 

1    14 
1    13 

125 

111 
101 

52 
38 

17 

8.0E-02 
9.0E-02 
5.7E-02 

6.1E+00 
6.2E-01 
6.2E-01 

3.1E+00 
3.5E-01 
3.4E-01 

22 
6 
8 

n  =  100 


GMW  I  I  19  65  36  I  1.3E-01  l.lE+00  6.3E-01  13 

(D)    S&E,  (a)  II  47  109  83  I  3.0E-03  5.7E-01  2.8E-01  34 

S&E,  (b)  I  I  30  79  41  |  6.9E-03  5.7E-01  2.9E-01  15 

GMW  II  28  114  54  I  6.3E-02  1.7E+00  8.7E-01  19 

(T)    S&E,  (a)  II  15  98  22  I  2.0E-02  5.9E-01  3.1E-01  8 

S&E,  (b)  II  15  107  36  I  2.8E-02  5.9E-01  3.1E-01  7 

GMW  II  58  139  92  |  8.8E-02  7.7E+00  3.9E+00  51 

(P)    S&E,  (a)  I  I  14  100  93  I  4.4E-02  6.0E-01  3.2E-01  7 

S&E,  (b)  II  13  100  21  I  2.4E-02  6.0E-01  3.1E-01  6 


250 


GMW  II  35  106  91  I  l.lE-01  1.2E+00  5.8E-01  28 

(D)   S&E, (a)  II  47  134  71  |  4.3E-02  5.8E-01  2.9E-01  33 

S&E,  (b)  II  126  197  183  I  3.2E-05  5.8E-01  2.9E-01  115 

GMW  II  45  138  88  I  8.4E-03  1.4E+00  7.2E-01  36 

(T)   S&E,  (a)  II  20  109  54  |  7  .  lE-02  5.9E-01  3.0E-01  10 

S&E, (b)  II  61  136  100  I  6.1E-03  5.9E-01  3.0E-01  52 

GMW  II  50  166  106  I  1.5E-02  7.8E+00  3.9E+00  40 

(P)   S&E,  (a)  11  17  108  40  I  2.0E-02  5.9E-01  3.1E-01  8 

S&E, (b)  II  23  124  56  I  2.0E-02  5.9E-01  3.1E-01  11 
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—  Table  IX,  cont .  — 


n  =1000 


GMW  II  55  151  128  I  2.2E-03  1.7E+00  8.3E-01  49 

(D)   S&E, (a)  II  150  220  229  I  3.1E-05  5.8E-01  2.9E-01  137 

S&E, (b)  II  223  276  319  I  2.9E-03  5.8E-01  2.9E-01  215 

GMW  II  86  206  181  I  2.6E-01  1.8E+00  l.OE+00  79 

(T)   S&E, (a)  II  34  95  74  I  4.5E-03  5.8E-01  2.9E-01  26 

S&E, (b)  II  145  184  205  I  4.5E-03  5.8E-01  2.9E-01  140 

GMW  II  50  133  126  I  6.7E-02  1.9E+00  9.9E-01  46 

(P)   S&E, (a)  II  21  94  60  I  5.9E-03  5.9E-01  3.0E-01  12 

S&E, (b)  II  41  109  61  I  5.9E-03  5.9E-01  3.0E-01  33 


The  symbols  (D) ,  (T) ,  and  (P)  denote  Diagonal,  Tridiagonal,  and 
Pentadiagonal  Preconditioners,  respectively 


Table  X 
Eigenvalue  Information  for  the  Trigonometric  Function 


XO 

X* 

n 

1    Amin 

/max 

#NEG 

P^min 

Amax 

#NEG 

======= 

====  1 

====== 

========== 

(50) 

H   1 

1  -5.7E-01 

1.4E+00 

30 

5.3E-02 

2.7E+00 

0 

M   1 

1  -6.0E-01 

1.4E+00 

31 

6.4E-02 

2.2E+00 

0 

(100) 

H   1 

1  -5.8E-01 

1.5E+00 

60 

6.3E-02 

2.8E+00 

0 

M   1 

1  -5.9E-01 

1.4E+00 

61 

6.5E-02 

2.2E+00 

0 

(250) 

H   1 

1  -5.8E-01 

1.5E+00 

152 

4.5E-02 

2.8E+00 

0 

M   1 

1  -5.9E-01 

1.5E+00 

152 

6.3E-02 

2.0E+00 

0 

(1000) 

H   1 

1  -5.8E-01 

1.5E+00 

607 

4.8E-02 

2.7E+00 

0 

M   1 

1  -5.8E-01 

1.5E+00 

608 

7.6E-02 

2.0E+00 

0 
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